Abstract

We study adaptation of a haploid asexual population on a fitness landscape defined over binary genotype sequences of length L. 
We consider greedy adaptive walks in which the population moves to the fittest among all single mutant neighbors of the current 
genotype until a local fitness maximum is reached. The landscape is of the rough mount Fuji type, which means that the fitness 
value assigned to a sequence is the sum of a random and a deterministic component. The random components are independent and 
identically distributed random variables, and the deterministic component varies linearly with the distance to a reference sequence. 
The deterministic fitness gradient c is a parameter that interpolates between the limits of an uncorrelated random landscape (c = 0) 
and an effectively additive landscape (c → ∞). When the random fitness component is chosen from the Gumbel distribution,
explicit expressions for the distribution of the number of steps taken by the greedy walk are obtained, and it is shown that the walk 
length varies non-monotonically with the strength of the fitness gradient when the starting point is sufficiently close to the reference 
sequence. Asymptotic results for general distributions of the random fitness component are obtained using extreme value theory, 
and it is found that the walk length attains a non-trivial limit for L → ∞, different from its values for c = 0 and c = ∞, if c is scaled
with L in an appropriate combination. 
1. Introduction


Ever since the concept of the fitness landscape was introduced by Sewall Wright (1932), it has played a central role in evolutionary biology (de Visser and Krug, 2014). Among the different variants of the concept used in the literature, we here restrict ourselves to fitness landscapes that map the genotype space into the real numbers by assigning a fitness value to every genotype. With this definition, the fitness landscape provides an intuitive picture of evolution as a hill-climbing process. A convenient choice for the genotype space is the L-dimensional hypercube {0,1}^L, which contains all binary sequences C = (1,0,1,…,1,1) of length L. Rather than specifying the genome on the level of DNA base pairs, the binary sequences keep track of the presence or absence of mutations compared to a wild-type genome, or (in a more coarse-grained representation) the presence or absence of entire genes.

In addition to the underlying fitness landscape, the dynamics of adaptation is governed by the population size N and the mutation rate U per genome, both of which are to be compared to the scale of fitness differences summarized in a typical selection coefficient s. In the strong selection / weak mutation (SSWM) regime characterized by the conditions Ns ≫ 1 and NU ≪ 1, the population is monomorphic for most of the time, and the adaptive process is guided by the landscape structure in a simple way (Gillespie, 1983, 1984). If a mutation to a fitter genotype occurs, it has a nonzero probability of fixing in the population, whereas a mutation to a sequence with lower fitness is certain to go extinct. The low mutation rate makes it very unlikely for double mutations to occur. Accordingly, in this regime the population behaves as a point in sequence space that moves uphill in the fitness landscape by single mutational steps, a process referred to as an adaptive walk (Gillespie, 1983, 1984; Kauffman and Levin, 1987). An obvious feature of adaptive walks is that they end on a fitness maximum, that is, a genotype without fitter one-mutant neighbors. This makes the walk length, the number of steps until a maximum is reached, a property of interest.

A simplified version of the adaptive walk problem, where the effect of mutant fitness on the fixation probability of beneficial mutations is neglected and any neighboring genotype of higher fitness can fix with equal probability, was studied by Macken and Perelson (1989) and Flyvbjerg and Lautrup (1992). For rugged landscapes without fitness correlations the mean number of steps of such ‘random’ adaptive walks was found to be of the order of ln L. When the effect of the fixation probability is incorporated, the mean walk length is still logarithmic in the number of loci, but the coefficient of ln L becomes dependent on the distribution of fitness values (Gillespie, 1983; Orr, 2002; Neidhart and Krug, 2011; Jain, 2011; Seetharaman and Jain, 2011). If the infinite L limit is taken, the walks no longer terminate and adaptation can be studied through the unbounded increase of the mean fitness of the population (Park and Krug, 2008).

When the population size is increased beyond the SSWM regime, the number of segregating sites becomes larger than two. In asexual populations this implies that two beneficial mutations compete with each other for fixation, and the one with the larger fitness will be fixed preferentially. This is a manifestation of the Hill-Robertson effect (Hill and Robertson, 1966) and is commonly known as clonal interference (Gerrish and Lenski, 1998; Wilke, 2004; Park and Krug, 2007; Desai and Fisher, 2007; Park et al., 2010). A rough criterion for the clonal interference regime is
provided by the condition NU ln(Ns) ≫ 1. If we denote the mean fixation time in this regime by T_0 (which depends on N, U, and s), almost all beneficial single mutant neighbors of the most populated genotype will be present during the fixation process if NUT_0 ≫ L. To model this regime by an adaptive walk, we use a deterministic rule for the next step: the walker chooses the genotype with the largest fitness among the sequences that are one mutation away. This kind of adaptive walk was called a ‘perfect’ or ‘gradient’ adaptive walk by Orr (2002, 2003), but here we follow Kauffman and Levin (1987) in referring to it as a greedy walk. Orr (2003) calculated the length of a greedy adaptive walk on an uncorrelated fitness landscape using an order statistics approach that is independent of the fitness distribution, provided it is continuous. In the limit L → ∞ the mean walk length is given by e − 1 ≈ 1.72, which was suggested to be a lower bound on the mean number of steps for any adaptive walk [see also Rosenberg (2005)]. Note that for this description to faithfully represent adaptation under strong clonal interference, the mutation rate has to be small enough that the creation of double mutants can be neglected (Szendro et al., 2013a).


The studies of adaptive walks mentioned above were based on the assumption of an uncorrelated random fitness landscape with maximal ruggedness, which is not supported by empirical evidence (Miller et al., 2011; Szendro et al., 2013b; de Visser and Krug, 2014). The effect of fitness correlations on adaptive walks has so far been addressed mostly in the context of ‘block model’ landscapes, in which the genotype is subdivided into independent modules, each of which is assigned a random fitness, and the mean walk length is additive over modules (Perelson and Macken, 1995; Orr, 2006; Seetharaman and Jain, 2014; Nowak and Krug, 2015). Here we consider greedy adaptive walks on another class of tunably rugged fitness landscapes, the rough mount Fuji (RMF) model, which was originally introduced in the context of protein evolution (Aita et al., 2000). In the RMF model an uncorrelated random fitness landscape is superimposed on a linear fitness gradient, and the slope of this gradient serves as a tuning parameter controlling the ruggedness of the landscape.

The RMF model has recently been found to provide a convenient parametrization of many empirical fitness data sets (Franke et al., 2011; Szendro et al., 2013b; Neidhart et al., 2013), while at the same time allowing for detailed mathematical analysis of a wide range of landscape properties (Neidhart et al., 2014; Park et al., 2015). Of particular interest for our work are the results on the existence of selectively accessible mutational pathways. These are defined here as pathways to the global fitness maximum along which fitness increases monotonically and which are moreover directed, in the sense that the distance to the global optimum decreases in each step (Weinreich et al., 2005; Franke et al., 2011). Hegarty and Martinsson (2014) have shown that such pathways exist in the RMF model with a probability approaching unity for L → ∞, whereas this probability tends to zero for uncorrelated landscapes. A population following a directed accessible pathway would perform an adaptive walk of O(L) steps, much longer than the walks on uncorrelated landscapes. However, the biological significance of accessible paths is not evident, because an evolving population may not find them even if they exist (Szendro et al., 2013a; Park et al., 2015).

In this paper, we study greedy adaptive walks on the RMF fitness landscape, focusing on the mean number of steps when L is very large. For a specific choice of the distribution of the random fitness component in the RMF model, we obtain an analytic solution for the full distribution of walk lengths and show that it attains a non-degenerate limit for L → ∞, similar to Orr’s analysis of the uncorrelated case (Orr, 2003). We also consider the dependence of the walk length on the distance of the initial genotype from the reference state, and show that in a range of distances the walk length varies non-monotonically with the strength of the fitness gradient.

Arbitrary distributions of the random fitness component can be treated in the limit L → ∞ by exploiting the convergence of the maximum of L random variables to one of the universal distributions of extreme value theory (EVT) (de Haan and Ferreira, 2006). The EVT approach to adaptation was pioneered by Gillespie (1984) and Orr (2002). It has meanwhile become an established conceptual framework that allows one to organize and quantify the relation between the distribution of mutational effects and the corresponding adaptive behavior (Joyce et al., 2008; Orr, 2010; Rokyta et al., 2008; Schenk et al., 2012; Bank et al., 2014). Similar to the analysis of fitness landscape properties for the RMF model presented by Neidhart et al. (2014), we find that the behavior of the walk length is governed by the interplay between the ruggedness parameter and the tail properties of the distribution of the random fitness component. Specifically, if the tail of the distribution is fatter than exponential, the walk length reverts to the behavior found by Orr for uncorrelated landscapes for any fixed value of the fitness gradient. On the other hand, for tails thinner than exponential the effective strength of the fitness gradient increases without bound with increasing L, such that the greedy walks traverse the entire landscape with high probability for L → ∞. A non-trivial limit of the walk length is attained only when c and L are scaled together in a particular combination.

2. Definitions 

The RMF fitness landscape is constructed from an additive ‘mount Fuji’ fitness landscape by adding an independent and identically distributed (i.i.d.) random variable to the fitness of every genotype. By C we denote a binary sequence of length L which represents the genotype. In particular, we will call the sequence C_r = (1,1,…,1) the reference sequence; it has the largest fitness in the purely additive landscape. Its antipodal point on the hypercube, the sequence with all elements 0, will be denoted by C_a. The fitness of a sequence C in the RMF fitness landscape is then assigned as

$$W(C) = -c\, d_r(C) + \xi_C, \qquad (1)$$

where d_r(C) is the Hamming distance between C and the reference sequence C_r, and c is a positive real number. The {ξ_C} are i.i.d. random variables with probability density f(ξ) and cumulative distribution function F(ξ), defined as

$$F(\xi) = \int_{-\infty}^{\xi} f(x)\, dx. \qquad (2)$$

The definition (1) should be interpreted in the Malthusian sense, where fitness values can be positive or negative. Hegarty and Martinsson (2014) proved the following: for c > 0, in the limit L → ∞ there is almost surely a directed path from the antipode C_a to the reference sequence C_r along which fitness is monotonically increasing, irrespective of the actual form of f(ξ), whereas for c = 0 such paths almost surely do not exist.

Since we are interested in greedy walks, the statistics of the maximal value among groups of i.i.d. random variables will play an important role. For this reason we introduce the probability G_k(x) that the largest value among L − k + 1 (k ≥ 1) i.i.d. ξ’s is smaller than x, which is

$$G_k(x) = \left(\int_{-\infty}^{x} f(y)\, dy\right)^{L-k+1} = F(x)^{L-k+1}, \qquad (3)$$

with the corresponding density g_k(x)

$$g_k(x) = \frac{dG_k(x)}{dx} = (L-k+1)\, f(x)\, F(x)^{L-k}. \qquad (4)$$

The reason for considering L − k + 1 variables rather than k variables will become clear in Sec. 3.

As has been noted previously (Franke et al., 2010, 2011; Neidhart et al., 2014), many properties of the RMF model take on a particularly simple form when the random variables ξ_C are drawn from the Gumbel distribution f(x) = exp(−x − e^{−x}), and we will adopt this choice in Sec. 3. For the Gumbel distribution, G_k(x) and g_k(x) become

$$G_k(x) = \int_{-\infty}^{x} g_k(y)\, dy = \exp\!\left[-(L-k+1)\, e^{-x}\right], \qquad (5)$$

$$g_k(x) = (L-k+1)\, \exp\!\left[-x - (L-k+1)\, e^{-x}\right]. \qquad (6)$$

The Gumbel distribution is one of the three universal limiting distributions that arise in extreme value theory (de Haan and Ferreira, 2006). We will exploit this connection in Sec. 4, where we study the properties of greedy adaptive walks for general choices of the distribution f(x).

3. Gumbel-distributed random fitness component

3.1. Greedy walks starting from the antipodal sequence

Our analysis begins with the greedy walk starting from the antipodal sequence C_a. As mentioned before, the probability that at least one accessible path from C_a to C_r exists converges to unity as L → ∞ for any finite c > 0 (Hegarty and Martinsson, 2014). If the greedy walker takes such a path with probability of O(1), the mean number of steps will be O(L). On the other hand, the RMF model with c = 0 is identical to the uncorrelated rugged landscape or House-of-Cards model (Kingman, 1978), and the mean number of steps of greedy walks is e − 1 ≈ 1.72 in the limit of infinite L (Orr, 2003). Thus the first question to address is whether the greedy walk length remains finite for L → ∞ when c > 0.

3.1.1. Exact solution

To find the mean walk distance, we consider the probability H_l that the walker takes at least l steps. For convenience, we denote the sequence at the l-th step by C_l, with C_0 = C_a. The fitness of C_l is the largest among the single mutant neighbors of C_{l−1}. To find H_l, we assume that d_r(C_l) is a decreasing function of l. That is, the walker only takes steps in the direction towards the reference sequence C_r, referred to as the uphill direction in the following. This assumption is plausible if L ≫ l. A downhill step is possible only if the largest among the l random fitness components of the downhill neighbors exceeds the largest among the L − l random fitness components of the uphill neighbors by at least 2c. Obviously, for reasonably large L and a setting with rather short walks, this probability is negligible. The validity of this assumption will be ascertained later in a self-consistent way. Once the H_l have been determined, it follows that the greedy walk takes exactly l steps with probability H_l − H_{l+1}. In turn, the mean number of steps is

$$\langle l \rangle = \sum_{l=1}^{L} l\,(H_l - H_{l+1}) = \sum_{l=1}^{L} H_l, \qquad (7)$$

where H_{L+1} is set to 0.

Let J_l(x) be the probability that the walker takes at least l steps with W(C_l) < −c(L − l) + x, and let j_l(x) = dJ_l(x)/dx (l = 0, 1, …, L). Obviously,

$$H_l = \lim_{x\to\infty} J_l(x) = \int_{-\infty}^{\infty} j_l(y)\, dy. \qquad (8)$$

A recursion relation for j_l(x) can be derived immediately from the definition:

$$j_l(x) = g_l(x)\, J_{l-1}(x+c) = g_l(x) \int_{-\infty}^{x+c} dy\, j_{l-1}(y), \qquad (9)$$

with j_0(x) = f(x) = exp(−x − e^{−x}). Since C_{l−1} has L − l + 1 nearest neighbors in the uphill direction, we have used g_l(x) defined in Eq. (4) in the recursion relation.

Introducing

$$a_k = e^{-kc} + \sum_{m=0}^{k-1} (L-k+m+1)\, e^{-mc} = e^{-kc} + \frac{L\,(1-e^{-kc}) - k}{1-e^{-c}} + \frac{1-e^{-kc}}{(1-e^{-c})^2}, \qquad (10)$$

which satisfies the recursion relation a_{k+1} = (L − k) + e^{−c} a_k with a_0 = 1, we can write


$$J_l(x) = \frac{L!}{(L-l)!} \left(\prod_{k=1}^{l} \frac{1}{a_k}\right) \exp\!\left(-a_l\, e^{-x}\right) \qquad (11)$$

for l ≥ 1. This can be proved straightforwardly by mathematical induction. Thus, we get

$$H_l = \prod_{k=1}^{l} \frac{L-k+1}{a_k} \qquad (12)$$

as an exact expression for the distribution of walk length. Note that in the above derivation, the sign of c does not play any role. The case of negative c can therefore be studied within the same scheme, and Eq. (12) is valid for any c. By symmetry, a greedy walk with negative c can be interpreted as a walk starting from the reference sequence C_r (see Sec. 3.2 for further discussion).
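Eq. (12) can be evaluated numerically with the minimal Python sketch below. It is our own illustration, not the authors' code, and the function names are hypothetical. The sketch builds a_k from the recursion a_{k+1} = (L − k) + e^{−c} a_k, accumulates H_l = Π (L − k + 1)/a_k, and sums the H_l as in Eq. (7). Like Eq. (12) itself, it assumes Gumbel noise and a walk that starts at C_a and moves only uphill.

```python
import math


def a_coefficients(L: int, c: float) -> list[float]:
    """Return [a_0, ..., a_L] from a_{k+1} = (L - k) + exp(-c) * a_k with a_0 = 1."""
    q = math.exp(-c)
    a = [1.0]
    for k in range(L):
        a.append((L - k) + q * a[k])
    return a


def survival_probabilities(L: int, c: float) -> list[float]:
    """Return [H_1, ..., H_L], H_l = prod_{k=1}^{l} (L - k + 1) / a_k (Eq. 12)."""
    a = a_coefficients(L, c)
    H, prod = [], 1.0
    for k in range(1, L + 1):
        prod *= (L - k + 1) / a[k]
        H.append(prod)
    return H


def mean_walk_length(L: int, c: float) -> float:
    """Mean number of steps <l> = sum_{l=1}^{L} H_l (Eq. 7)."""
    return sum(survival_probabilities(L, c))


if __name__ == "__main__":
    for c in (0.0, 0.5, 1.0, 2.0):
        print(f"c = {c}: <l> = {mean_walk_length(1000, c):.4f}, "
              f"1 + e^c = {1 + math.exp(c):.4f}")
```

For c = 0 and large L the result approaches Orr's value e − 1. For every c it stays below 1 + e^c.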

Since it does not appear feasible to extract simple analytic formulae from (12) for arbitrary c and L, below we will present approximate calculations for certain limiting cases. Before delving into detail, we derive a simple upper bound on ⟨l⟩. We have a_k > (L − k + 1) + (L − k + 2)e^{−c} > (L − k + 1)(1 + e^{−c}) for k ≥ 2, and a_1 = L + e^{−c} > L. It follows that

$$H_l < (1+e^{-c})^{-(l-1)}, \qquad (13)$$

which gives

$$\langle l \rangle = \sum_{l=1}^{L} H_l < \sum_{l=1}^{\infty} (1+e^{-c})^{-(l-1)} = 1 + e^{c}. \qquad (14)$$

This upper bound clearly shows that ⟨l⟩/L → 0 as L → ∞ for any c when ξ is drawn from the Gumbel distribution. That is, it is highly unlikely that a greedy walk can follow an accessible path all the way to the reference state, although such paths exist with probability 1, as shown by Hegarty and Martinsson (2014).


3.1.2. The limit L → ∞ at finite c

Since Eq. (13) is valid for any L, H_l should be exponentially small for l ~ O(L) once L ≫ e^{c}. This self-consistently affirms the validity of the assumption used in writing down H_l. To extract the L → ∞ limit of H_l from (12), we use a_k ≈ L(1 − e^{−kc})/(1 − e^{−c}) to obtain

$$H_l = \prod_{k=1}^{l} \frac{1-e^{-c}}{1-e^{-kc}}. \qquad (15)$$


This expression has an appealing interpretation in terms of so-called q-analogues (Koekoek et al., 2010). Recall that the q-analogue of a number n can be defined by [n]_q = (1 − q^n)/(1 − q), which satisfies the basic property that lim_{q→1} [n]_q = n. Defining the q-factorial as [n]_q! = Π_{k=1}^{n} [k]_q, we see that H_l = ([l]_{e^{−c}}!)^{−1}. This reduces to Orr’s result H_l = (l!)^{−1} in the limit c → 0, e^{−c} → 1. Moreover, the mean walk length is given by

$$\langle l \rangle = \sum_{l=1}^{\infty} H_l = \exp_{e^{-c}}(1) - 1, \qquad (16)$$

where exp_q(x) is the q-exponential function, defined as

$$\exp_q(x) = \sum_{n=0}^{\infty} \frac{x^n}{[n]_q!}.$$

In fact, an alternative derivation of (15) can be set up in complete analogy to the original approach of Orr (2003) [see Neidhart (2014)].


We note for later reference that the expression (15) has been derived previously for a different quantity (Franke et al., 2010). There it is the probability that l random variables y_k = x_k + ck are in ascending order, y_1 < y_2 < … < y_l, where the x_k are drawn independently from a Gumbel distribution. The reason for this coincidence will become clear below in Sec. 4.1.
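The coincidence with the ordering probability can be checked with a quick Monte Carlo experiment. The sketch below is a hypothetical illustration with names of our own choosing. It uses NumPy's Generator.gumbel, whose default location 0 and scale 1 correspond to the density exp(−x − e^{−x}) used here. It compares the empirical frequency of y_1 < … < y_l with the product in Eq. (15).

```python
import numpy as np


def ordered_probability_exact(l: int, c: float) -> float:
    """Eq. (15): prod_{k=1}^{l} (1 - e^{-c}) / (1 - e^{-kc}), valid for c != 0."""
    # expm1 keeps (1 - e^{-c}) accurate for small |c|; the two minus signs cancel
    return float(np.prod([np.expm1(-c) / np.expm1(-k * c) for k in range(1, l + 1)]))


def ordered_probability_mc(l: int, c: float, n: int = 200_000, seed: int = 1) -> float:
    """Fraction of samples with y_1 < ... < y_l, where y_k = x_k + c*k, x_k ~ Gumbel(0, 1)."""
    rng = np.random.default_rng(seed)
    x = rng.gumbel(size=(n, l))               # standard Gumbel: density exp(-x - e^{-x})
    y = x + c * np.arange(1, l + 1)           # add the linear trend c*k to column k
    return float(np.mean(np.all(np.diff(y, axis=1) > 0, axis=1)))


if __name__ == "__main__":
    print(ordered_probability_mc(4, 0.5), ordered_probability_exact(4, 0.5))
```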


3.1.3. Approximations for large and small c

We next evaluate (15) for large and small c, respectively. If c ≫ 1, H_l can be approximated as

$$H_l \approx \frac{(1-e^{-c})^{l-1}}{1-e^{-2c}} \qquad (17)$$

for l ≥ 2, and H_1 = 1. In the above approximation, we have kept terms up to O(e^{−2c}) in the denominator. Hence the mean distance becomes

$$\langle l \rangle \approx 1 + \sum_{k=2}^{\infty} \frac{(1-e^{-c})^{k-1}}{1-e^{-2c}} = 1 + \frac{1}{e^{-c} + e^{-2c}}, \qquad (18)$$

which is close to the upper bound of Eq. (14).

For |c| ≪ 1, we expand (1 − e^{−c})/(1 − e^{−kc}) up to O(c^3), which yields

$$\frac{1-e^{-c}}{1-e^{-kc}} = \frac{1}{k} \exp\!\left[\frac{k-1}{2}\,c - \frac{k^2-1}{24}\,c^2 + O(c^4)\right]. \qquad (19)$$

Accordingly, H_l is approximated as

$$\begin{aligned} l!\,H_l &= \exp\!\left[\frac{l(l-1)}{4}\,c - \frac{2l^3 + 3l^2 - 5l}{144}\,c^2 + O(c^4)\right] \\ &= 1 + \frac{(l)_2}{4}\,c + \frac{9(l)_4 + 32(l)_3}{288}\,c^2 + \frac{3(l)_6 + 32(l)_5 + 72(l)_4 - 24(l)_2}{1152}\,c^3 + O(c^4), \end{aligned} \qquad (20)$$

where the Pochhammer symbol (l)_k = l!/(l − k)! has been used. Since Σ_{l=1}^{∞} (l)_k/l! = e − δ_{k,0}, the mean distance becomes

$$\langle l \rangle = e\left(1 + \frac{c}{4} + \frac{41}{288}\,c^2 + \frac{83}{1152}\,c^3\right) - 1 + O(c^4), \qquad (21)$$

which reproduces the result of Orr (2003) when c = 0. The leading-order correction is linear in c. This implies that walks starting at the reference sequence (c < 0) are shorter than e − 1 when |c| is small. We will see below in Sec. 3.2 how this result generalizes to walks starting close to, but not at, the reference sequence.
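The exact L → ∞ result (16) and the two approximations (18) and (21) are easy to compare numerically. The following sketch uses hypothetical function names. It accumulates the q-factorial term by term and truncates the series once the terms become negligible. It uses math.expm1 to evaluate [l]_q = (1 − q^l)/(1 − q) accurately when |c| is small.

```python
import math


def mean_walk_length_exact(c: float, tol: float = 1e-16, l_max: int = 10**6) -> float:
    """<l> = exp_q(1) - 1 = sum_{l>=1} 1/[l]_q! with q = e^{-c} (Eq. 16, L -> infinity)."""
    total, h, l = 0.0, 1.0, 0
    while l < l_max:
        l += 1
        # [l]_q = (1 - q^l) / (1 - q); it tends to l as c -> 0
        q_num = float(l) if c == 0 else math.expm1(-l * c) / math.expm1(-c)
        h /= q_num                    # h = H_l = 1 / [l]_q!
        total += h
        if h < tol * total:           # remaining terms are negligible
            break
    return total


def large_c_approx(c: float) -> float:
    """Eq. (18): 1 + 1/(e^{-c} + e^{-2c})."""
    return 1 + 1 / (math.exp(-c) + math.exp(-2 * c))


def small_c_approx(c: float) -> float:
    """Eq. (21): e (1 + c/4 + 41 c^2/288 + 83 c^3/1152) - 1."""
    return math.e * (1 + c / 4 + 41 * c**2 / 288 + 83 * c**3 / 1152) - 1


if __name__ == "__main__":
    for c in (-0.1, 0.0, 0.1, 2.0, 4.0):
        print(c, mean_walk_length_exact(c), small_c_approx(c), large_c_approx(c))
```

A design note: the loop stops on a relative tolerance rather than a fixed number of terms. This is necessary because for large c the terms decay only like (1 − e^{−c})^l, so thousands of terms may be needed.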


Figure 1: Semi-logarithmic plot of the mean walk length ⟨l⟩ as a function of the strength c of the fitness gradient. Simulation data are shown together with the approximations Eqs. (18), (22), and the upper bound Eq. (14). Inset: Linear plot of ⟨l⟩ vs. c together with Eq. (21).

Figure 2: Mean walk length ⟨l⟩ is analyzed for large c by plotting 1 − ⟨l⟩/L vs. Le^{−c} for L = 2^{…}, …, 2^{…}. All data points nicely collapse onto a single curve, which is the scaling function A(x) in Eq. (27). The asymptotic behavior A(x) ≈ x/2 for x → 0 is also confirmed.


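If c < 0 and |c| ≫ 1, then (L − k + 1)/a_k ≈ e^{(k−1)c} for k ≥ 2, while L/a_1 = L/(L + e^{−c}) → 1 as L → ∞. Hence H_1 = 1, and H_l is of order e^{l(l−1)c/2} for l ≥ 2. To keep terms up to order e^{2c}, it is therefore enough to consider only H_1 + H_2. Using H_2 = (1 − e^{−c})/(1 − e^{−2c}) = 1/(1 + e^{−c}) from (15), this gives

$$\langle l \rangle = 1 + e^{c} - e^{2c} + O(e^{3c}). \qquad (22)$$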


Note that even if |c| → ∞, the walker takes at least one step. This is because we take the L → ∞ limit before the |c| → ∞ limit. Under this order of limits, the probability that the reference sequence is a local maximum is zero for any c. For later purposes we recall the probability for a sequence at distance d from the reference sequence to be a local fitness maximum (Neidhart et al., 2014):

$$p_{\max}(d) = \frac{1}{1 + d\,e^{c} + (L-d)\,e^{-c}}, \qquad (23)$$

which vanishes when the limit L → ∞ is taken for d = 0 and fixed c. Thus the walker needs to take at least one step to reach a maximum.
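Eq. (23) can also be checked by direct sampling. The sketch below uses hypothetical helper names and NumPy's standard Gumbel sampler. It draws the random fitness component of a focal genotype and of its L neighbors. It then shifts the d neighbors that are closer to C_r by +c and the remaining L − d neighbors by −c, and counts how often the focal genotype is fitter than all of them.

```python
import numpy as np


def p_max(L: int, d: int, c: float) -> float:
    """Eq. (23): probability that a genotype at distance d from C_r is a local maximum."""
    return 1.0 / (1.0 + d * np.exp(c) + (L - d) * np.exp(-c))


def p_max_mc(L: int, d: int, c: float, n: int = 200_000, seed: int = 2) -> float:
    """Monte Carlo estimate of the same probability with standard Gumbel noise.

    Relative to the focal genotype, its d neighbours closer to C_r gain +c in the
    deterministic part of W(C), and its L - d other neighbours lose c.
    """
    rng = np.random.default_rng(seed)
    xi0 = rng.gumbel(size=n)
    # initial=-inf makes the maximum over an empty group (d = 0 or d = L) harmless
    up = rng.gumbel(size=(n, d)).max(axis=1, initial=-np.inf) + c
    down = rng.gumbel(size=(n, L - d)).max(axis=1, initial=-np.inf) - c
    return float(np.mean(xi0 > np.maximum(up, down)))


if __name__ == "__main__":
    print(p_max_mc(10, 3, 0.5), p_max(10, 3, 0.5))
```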

In Fig. 1 we compare ⟨l⟩ obtained from simulations of 10^{…} independent realizations with sequence length L = 2^{…} to the approximations Eqs. (18), (21), and (22), together with the upper bound of Eq. (14). The simulation method is explained in Appendix A. As a rule of thumb, the large |c| approximations work well for |c| > 1, and the approximation for |c| ≪ 1 becomes accurate for |c| < 1.

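3.1.4. The limit c → ∞ at finite L

For finite L, it is clear that the mean walk length should approach L as c → ∞. This limit can be attained when c is much larger than the (typical) largest value among L i.i.d. random variables. For the Gumbel case, this corresponds to ln L ≪ c or Le^{−c} ≪ 1. To find an approximate solution for ⟨l⟩ under this condition, we go back to Eq. (12) and expand (L − k + 1)/a_k in terms of e^{−c} as

$$\frac{L-k+1}{a_k} \approx 1 - e^{-c}\,\frac{L-k+2}{L-k+1} \quad (k \ge 2), \qquad \frac{L}{a_1} \approx 1 - \frac{e^{-c}}{L}, \qquad (24)$$

which gives

$$H_l \approx 1 - e^{-c}\left[\,l - 1 + \sum_{k=1}^{l} \frac{1}{L-k+1}\right] \qquad (25)$$

and

$$\langle l \rangle \approx L - e^{-c}\left[\frac{L(L-1)}{2} + \sum_{l=1}^{L}\sum_{k=1}^{l} \frac{1}{L-k+1}\right] = L\left[1 - \frac{L+1}{2}\,e^{-c}\right]. \qquad (26)$$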


As anticipated, Le^{−c} appears as an expansion parameter and ⟨l⟩ approaches L as c → ∞. Thus, it is quite plausible to assume a scaling form such that

$$1 - \frac{\langle l \rangle}{L} = A(Le^{-c}), \qquad (27)$$

where A(x) is a scaling function with asymptotic behavior A(x) ≈ x/2 for sufficiently small x. That is, if we plot 1 − ⟨l⟩/L as a function of Le^{−c} for sufficiently large L, the data obtained for different combinations of L and c should collapse onto a single curve. To confirm this, we performed simulations for L ranging from 2^{…} to 2^{…}. Figure 2, which is the result of 10^{…} independent realizations for each data point, indeed confirms the existence of such a scaling function.
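The scaling form (27) can be probed directly with the finite-L expression (12). The sketch below uses hypothetical function names and evaluates 1 − ⟨l⟩/L at fixed x = Le^{−c} for several L. It tests the uphill-only approximation behind Eq. (12), not the simulated walks of Fig. 2.

```python
import math


def mean_walk_length(L: int, c: float) -> float:
    """<l> from Eqs. (7), (10) and (12) for finite L (Gumbel noise, uphill steps only)."""
    q, a, prod, total = math.exp(-c), 1.0, 1.0, 0.0
    for k in range(1, L + 1):
        a = (L - k + 1) + q * a          # a_k from a_{k-1}, starting from a_0 = 1
        prod *= (L - k + 1) / a          # running product = H_k
        total += prod
    return total


def scaled_deficit(L: int, x: float) -> float:
    """1 - <l>/L evaluated at c = ln(L/x), i.e. at fixed scaling variable x = L e^{-c}."""
    return 1 - mean_walk_length(L, math.log(L / x)) / L


for x in (0.01, 0.1, 1.0):
    print(x, [round(scaled_deficit(L, x), 4) for L in (64, 256, 1024)])
```

The rows of the printed table should be nearly independent of L, and for small x the values should be close to x/2, in line with Eq. (26).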

3.2. Greedy walks with arbitrary starting point

In this section, we relax the assumption that the walk always starts at the antipodal sequence C_a. We calculate the mean number of steps in the case that the initial genotype has Hamming distance d_0 from the reference sequence C_r. Note that the case treated in the previous section corresponds to d_0 = L. The case with c < 0 in the previous section can be understood as a greedy walk starting at d_0 = 0 with positive c. We consider the limit L, d_0 → ∞ with a = d_0/L kept finite. Since the RMF landscape is symmetric under the simultaneous transformations c ↦ −c and d_0 ↦ L − d_0, we can set c to be non-negative without loss of generality.


3.2.1. Exact asymptotic solution

Unlike in the previous section, the initial genotype has O(L) neighbors in both the uphill and downhill directions, and we cannot exclude the possibility that the walker takes a downhill step. Assume that the walker arrives at the sequence C_l at the l-th step and d_r(C_l) = d. Note that d_0 − d need not be the same as l. Now, we introduce the function

$$q(y,\sigma) = \begin{cases} \dfrac{\mathrm{d}\,[F(y)^{d}]}{\mathrm{d}y}\,\big[F(y+2c)\big]^{L-d} & \text{if } \sigma = +1, \\[2ex] \dfrac{\mathrm{d}\,[F(y)^{L-d}]}{\mathrm{d}y}\,\big[F(y-2c)\big]^{d} & \text{if } \sigma = -1, \end{cases} \qquad (28)$$

which is interpreted as the probability density that the largest fitness among the uphill (downhill) neighbors has the random contribution y and all downhill (uphill) neighbors have smaller fitness when σ = 1 (−1).

As in Sec. 3.1, the probability of taking at least l steps is denoted by H_l. Since the walker may move in the uphill or downhill direction with non-negligible probability, we have to take into account all possible combinations of directions. Let σ_l ∈ {±1} be the change in the distance from the antipodal sequence C_a at the l-th step. The change in d over a path is then stored in an ordered set {σ}_l = (σ_1, σ_2, …, σ_l). Defining M_l = Σ_{k=1}^{l} σ_k, the Hamming distance from C_r after l steps is d = d_0 − M_l. We assume (and will subsequently verify) that the probability that the walker takes O(L) steps is exponentially small for large L. Accordingly, the scaled distance d/L, and therefore the function q(y, σ) in (28), do not change significantly during the walk. Within this assumption, we can approximate (28) in the form

$$q(y,\sigma) = L\beta\, s_\sigma\, Q(y + \sigma c), \qquad (29)$$

where β = a e^{c} + (1 − a) e^{−c}, s_1 = a e^{c}/β, s_{−1} = 1 − s_1, and Q(x) = exp(−x − e^{−x} Lβ), which is independent of l.
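For Gumbel noise, the step from (28) to (29) is in fact an identity once d is set to aL, because d/dy F(y)^n = n e^{−y} F(y)^n. The approximation lies only in neglecting the change of d during the walk. The following sketch, with hypothetical function names, evaluates both sides numerically as a consistency check.

```python
import math


def gumbel_cdf(t: float) -> float:
    """F(t) = exp(-e^{-t})."""
    return math.exp(-math.exp(-t))


def q_exact(y: float, sigma: int, L: int, d: int, c: float) -> float:
    """Eq. (28) for Gumbel noise, using d/dy F(y)^n = n e^{-y} F(y)^n."""
    if sigma == +1:
        return d * math.exp(-y) * gumbel_cdf(y) ** d * gumbel_cdf(y + 2 * c) ** (L - d)
    return (L - d) * math.exp(-y) * gumbel_cdf(y) ** (L - d) * gumbel_cdf(y - 2 * c) ** d


def q_scaling_form(y: float, sigma: int, L: int, d: int, c: float) -> float:
    """Eq. (29): L * beta * s_sigma * Q(y + sigma*c), with a = d / L."""
    a = d / L
    beta = a * math.exp(c) + (1 - a) * math.exp(-c)
    s_plus = a * math.exp(c) / beta
    s = s_plus if sigma == +1 else 1 - s_plus
    x = y + sigma * c
    Q = math.exp(-x - math.exp(-x) * L * beta)
    return L * beta * s * Q


if __name__ == "__main__":
    for sigma in (+1, -1):
        print(sigma, q_exact(4.6, sigma, 100, 30, 0.7), q_scaling_form(4.6, sigma, 100, 30, 0.7))
```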

Let J_l(x, {σ}_l) be the probability that a walk has moved according to {σ}_l and that the fitness of the sequence at the l-th step is smaller than −c(d_0 − M_l) + x. With j_l(x, {σ}_l) = ∂J_l(x, {σ}_l)/∂x, we then have, in analogy to (8),

$$H_l = \sum_{\{\sigma\}_l} J_l(\infty, \{\sigma\}_l) = \sum_{\{\sigma\}_l} \int_{-\infty}^{\infty} j_l(x, \{\sigma\}_l)\, dx, \qquad (30)$$

where the summation is over all 2^l possible combinations of {σ}_l. Similar to (9), one can construct a recursion relation for j_l, which reads

$$j_l(x, \{\sigma\}_l) = q(x, \sigma_l)\, J_{l-1}(x + \sigma_l c, \{\sigma\}_{l-1}) = q(x, \sigma_l) \int_{-\infty}^{x+\sigma_l c} j_{l-1}(y, \{\sigma\}_{l-1})\, dy. \qquad (31)$$

For l = 1,

$$j_1(x, \sigma_1) = q(x, \sigma_1) \int_{-\infty}^{x+\sigma_1 c} f(y)\, dy = q(x, \sigma_1)\, F(x + \sigma_1 c) \approx q(x, \sigma_1), \qquad (32)$$

where we have approximated F ≈ 1 because the relevant fitness values reside far in the tail of the distribution when L is large.

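If we neglect the effect of the change in d on q(x, σ_l), as assumed above, we get

$$\begin{aligned} J_l(x, \{\sigma\}_l) &\approx (L\beta)^{l} \left(\prod_{k=1}^{l} s_{\sigma_k}\right) \int_{-\infty}^{x} dy_l\, Q(y_l + \sigma_l c) \prod_{k=l}^{2} \int_{-\infty}^{y_k + \sigma_k c} Q(y_{k-1} + \sigma_{k-1} c)\, dy_{k-1} \\ &= \left(\prod_{k=1}^{l} \frac{s_{\sigma_k}}{\sum_{m=0}^{k-1} e^{-c(M_{k-1} - M_m)}}\right) \exp\!\left[-L\beta\, e^{-x} \sum_{m=0}^{l-1} e^{-c(M_l - M_m)}\right], \end{aligned} \qquad (33)$$

where M_0 = 0. In the first line, the product of nested integrals is index-ordered in descending order of k and should be interpreted as 1 if l = 1. The solvability of the nested chain of integrals in (33) is specific to the Gumbel distribution; see Appendix B. From Eqs. (30) and (33), we arrive at our central result

$$H_l = \sum_{\{\sigma\}_l} \prod_{k=1}^{l} \frac{s_{\sigma_k}}{\sum_{m=0}^{k-1} \exp\!\left[-c\,(M_{k-1} - M_m)\right]}, \qquad (34)$$

which reduces to Eq. (15) when a = 1.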
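The sum in (34) runs over 2^l direction sequences, so it can be evaluated by brute force for small l. The sketch below does exactly that; it is our own illustration and the function names are hypothetical. It can be used to confirm the special cases discussed in the text: a = 1 reproduces Eq. (15), a = 0 gives Eq. (15) with c → −c, and H_l increases with a.

```python
import itertools
import math


def H(l: int, a: float, c: float) -> float:
    """Eq. (34): probability of at least l steps from a start at scaled distance a."""
    beta = a * math.exp(c) + (1 - a) * math.exp(-c)
    s = {+1: a * math.exp(c) / beta, -1: (1 - a) * math.exp(-c) / beta}
    total = 0.0
    for sigma in itertools.product((+1, -1), repeat=l):   # all 2^l step sequences
        M = [0]                                           # M_0 = 0, M_k = sigma_1 + ... + sigma_k
        for step in sigma:
            M.append(M[-1] + step)
        weight = 1.0
        for k in range(1, l + 1):
            denom = sum(math.exp(-c * (M[k - 1] - M[m])) for m in range(k))
            weight *= s[sigma[k - 1]] / denom
        total += weight
    return total


def H_antipode(l: int, c: float) -> float:
    """Eq. (15), the a = 1 case."""
    return math.prod((1 - math.exp(-c)) / (1 - math.exp(-k * c)) for k in range(1, l + 1))


if __name__ == "__main__":
    print([round(H(4, a, 1.0), 5) for a in (0.0, 0.2, 0.5, 0.8, 1.0)])
```

The cost grows like l·2^l, which is fine for the first handful of H_l. Summing to convergence for larger c would need a smarter scheme.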


3.2.2. Dependence of the walk length on a and c

Two facts bound the expression (34). First, e^{−(k−1−m)c} ≤ exp[−c(M_{k−1} − M_m)] ≤ e^{(k−1−m)c}. Second, s_1 + s_{−1} = 1. Together they bound (34) from below and above by its values for a = 0 and a = 1, respectively:

$$H_l\big|_{a=0} = \prod_{k=1}^{l} \frac{1-e^{c}}{1-e^{kc}} \;\le\; H_l \;\le\; \prod_{k=1}^{l} \frac{1-e^{-c}}{1-e^{-kc}} = H_l\big|_{a=1}. \qquad (35)$$

In fact, using ∂s_1/∂a ≥ 0 and exp(−c + A) < exp(c + A) for any real A, one can easily see that ∂H_l/∂a ≥ 0 for l ≥ 2, where the equality holds only when c = 0. That is, H_l is an increasing function of a. Correspondingly, the mean walk length (7) decreases monotonically as the starting point approaches the reference sequence, as one would intuitively expect.

Figure 3: The mean walk length ⟨l⟩ is plotted as a function of c for different starting points a = 2^{−…}, 2^{−…}, and 2^{−…} (from top to bottom), with comparison to the expansion in Eq. (37). The sequence length is L = 2^{…}, and the number of independent runs for each data point is 10^{…}.

Figure 4: Semi-logarithmic plots of ⟨l⟩ − 1 vs. c for a = 2^{−…}, 2^{−…}, and 2^{−…} (from top to bottom), with comparison to Eq. (39). The sequence length is L = 2^{…}, and the number of independent runs for each data point is 10^{…}.

By contrast, the dependence of the mean walk length on c is more complex. We have seen above in Sec. 3.1.3 that the walk length decreases with increasing c when the walk starts at the reference sequence (a = 0). We will now show that such an initial decrease occurs whenever a < 1/2. On the other hand, for very large c the walk length must approach aL for any a > 0, and we must therefore expect a non-monotonic dependence on c for 0 < a < 1/2. Such behavior was already reported by Neidhart et al. (2014) on the basis of numerical simulations. When c ≪ 1, we can approximate H_l up to O(c^2) as (see Appendix B for the derivation)

$$l!\,H_l = 1 + \frac{(l)_2}{4}\,\delta\, c + \frac{9(l)_4 + 32(l)_3}{288}\,\delta^2 c^2 + \frac{(l)_3}{27}\,(1-\delta^2)\, c^2 + O(c^3), \qquad (36)$$

where δ = s_1 − s_{−1} = (a e^{c} − (1 − a) e^{−c})/β. Accordingly, the mean number of steps becomes


$$\langle l \rangle \approx e\left[1 + \frac{2a-1}{4}\,c + \frac{123 + 500\,a(1-a)}{864}\,c^2\right] - 1, \qquad (37)$$

where we have also expanded δ = (2a − 1) + 4a(1 − a)c + O(c^2) in powers of c. Hence ⟨l⟩ is an increasing function of c for a > 1/2 when c is small enough, while for a < 1/2 the mean walk length initially decreases with c. Since the walk length is known to increase at large c, there must be a turning point. In the quadratic approximation (37), it is given by

$$c_{\mathrm{turn}} \approx \frac{108\,(1-2a)}{123 + 500\,a(1-a)}. \qquad (38)$$

A comparison of Eq. (37) with simulations is shown in Fig. 3, which illustrates the accuracy of the analytic expression (37) for small c. As predicted, it also confirms the absence of a turning point for a > 1/2. As a decreases, the position of the turning point found in the simulations moves to larger c. This makes the small c approximation inaccurate for precisely pinpointing c_turn.

From Fig. 3, the position c_turn of the turning point seems to diverge as a → 0. When a = 0, the mean walk length decreases as ⟨l⟩ ≈ 1 + e^{−c} for sufficiently large c, as shown in Sec. 3.1.3. When a is very small, ⟨l⟩ should therefore first decrease as 1 + e^{−c}, but eventually increase with c for sufficiently large c. As in the case of c < 0 and |c| ≫ 1 in Sec. 3.1.3, when ⟨l⟩ − 1 ≪ 1 this quantity is expected to be well approximated by

$$\langle l \rangle - 1 \approx H_2 = \frac{s_1}{1+e^{-c}} + \frac{s_{-1}}{1+e^{c}} = \frac{1 + \epsilon\, e^{3c}}{(1+e^{c})(1+\epsilon\, e^{2c})}, \qquad (39)$$

where ε = a/(1 − a). In Fig. 4 we compare simulation results for small a (a ≤ 2^{−10}) with Eq. (39), which shows excellent agreement as long as ⟨l⟩ − 1 < 0.1. Hence the turning point can be found by investigating the minimum of H_2, which gives

$$c_{\mathrm{turn}} \approx -\frac{1}{3}\ln(2\epsilon), \qquad (40)$$

where we have only kept the leading order in ε ≪ 1. Note that c_turn indeed diverges as a → 0. When c < c_turn, the mean walk length is well approximated by 1 + e^{−c}, which is the result for a = 0 with c ≫ 1.

To put these results into perspective and provide an intuitive 
explanation of the observed non-monotonic behavior as a func¬ 
tion of c, it is instructive to compare the mean walk length to 
the density of local fitness maxima. Since the walk is trapped at 
local maxima, one genera lly expects an inverse relationship be¬ 
tween the two quantities ( WeinbergeR 1991 ; Nowak and Krusl 
2015h . According to (|2^ . the density of local fitness maxima at 
distance d from the reference sequence becomes == l/(/3L) 
in the limit when L, c/ —> 00 at fixed a - d/L, where we recall 
that /3 - ae‘^ + (1 - a)e~‘^. It is straightforward to check that p™” 
decreases monotonically with increasing a but displays a max¬ 
imum as a function of c for a < ^ The maximum is located 
at Ctum = - 5 111 ® which is similar to (l40l) and also diverges for 
a —> 0. We may thus conclude that, at least qualitatively, the 
behavior of the greedy walk length reflects that of the density 
of local maxima. 


4. General distribution of the random fitness component 

4.1. Reformulation of the problem 

Up to now, we have presented a detailed analysis of greedy 
adaptive walks for the case of Gumbel-distributed random fit¬ 
ness components. In this section, we will generalize our find¬ 
ings to arbitrary probability distribution functions F{y), focus¬ 
ing on the limit L —» 00 . As in Sec. 13.21 the initial geno¬ 
type from which the walker starts has the Hamming distance 
c/o from the reference sequence and we take do,L^ 00 at fixed 
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a - dolL. Under these conditions the walker takes both uphill 
and downhill steps. As long as the number of steps taken is 
much smaller than L, the walk dynamics can be formulated in 
terms of the following game: 

At each round n {n = 1,2,...), one generates two random 
variables Y„ and F_„, where Y„ is drawn from the distribution 
F(y)^ and from Then choose the larger one 

between Y„+c and Y-„-c. Assuming that the larger one is To-„h + 
cr„c where cr„ can be either 1 or - 1 , this number is compared to 
X„-i, with Xq = -oo. If X„-i is larger than „ + cr„c the game 
is over. Otherwise, we set X,, = Ya-^^„ and go to the next round. 
Then the mean number of steps in the greedy walk is the same 
as the mean number of rounds up to the end of the game. 

For convenience, we introduce an event 

En(cr) = {Ya-n + CTC > Y.a-n - CTC & Y^-n + CTC > ( 41 ) 

where X„-i is defined as above. With this notation, we can 
write down the probability that the game persists at least up to I 
rounds as 

Hi = Prob(£'i(cri) n £2(0-2) n ■ ■ ■ n E 1(0-1)), ( 42 ) 

lo-l; 


exists and is non-degenerate. 

In terms of the transformed random variables, the event 
En(o-) can be recast as 

Efi(o-) — "I" rrc > o-c & o-c > Xy^—i], (46) 

where c = c/az, and = Zo-, „. In the following we apply this 
approach to the three classes of extreme value distributions. 

Gumbel class. As a representative of the Gumbel class of ex¬ 
treme value theory we choose the Weibull distribution F(y) - 
1 - e^-'". Setting 

Yk = (lnL)P®(l -H = (InL)P® -i--(47) 

^ \ eiwLI ^ ’ 6»(lnL)i-F» ^ ^ 

the limit (l45l l becomes the Gumbel distribution 

Xc(z) = (48) 

with support -oo < z < oo, as can be seen using the approxima¬ 
tion};® = lnL(l-i-z/(01nL))® lnL-HzH-o(l/lnL). Accordingly, 

c = 6 'c(lnL)‘-P®. (49) 


where the summation is over all possible sequences of cr’s of 
length /. 

For a - 1 all steps are in the uphill direction and (l42li reduces 
to a single term with cri = 0-2 = • ■ - cr/ = 1 , which can be 
written as 


Hi = Prob(Fi -H c < F 2 + 2c < • ■ ■ < F/ H- cl). 


(43) 


that is, the probability that the sequence of random variables 
Y„ + cn is ascendin gly ordered. This quantity was studied by 
Franke et al. ( 201(lh who showed that it is given by (flSl l when 


the F„’s are drawn from the Gumbel distribution. To see why 
this result applies in the present context, we note that the distri¬ 
bution function of the maximum among L i.i.d. Gumbel random 
variables is given by 


F(y)^ = exp[-Le^"] = exp[-e“^''"^>] ^ F(y - InL), (44) 

which is identical to the original distribution up to an overall 
shift that doesn’t affect the ordering probability (l43T l. 


4.2. Extreme value classes 


In order to analyze the problem for general choices of the 
distribution function F(y), we exploit the fact that F(y)^ and 
F(y)^P~“^ converge to one of the extreme value distributions 
when the limit L —> 00 is combi ned with a suitable rescaling 
of y dde Haan and Ferreira! 2006h . Specifically, we introduce 
random variables such that Ft = alZa + bi, where k is an 
integer, a^ and b^ are parameters that depend on L but not on k, 
and ai > 0. The parameters a^ and bi have to be chosen such 
that the distribution of Za has a well defined limit as L —» 00 , 
that is, such that 


For the case of an exponential distribution (6 - 1) it follows 
that c - c, and we conclude that the results derived in Sec.|3]for 
Gumbel-distributed random fitness components in fact apply 
asymptotically to all distributions with exponential tails. On 
the other hand, when the tail of the distribution is fatter (0 < 1 ) 
or thinner (0 > 1 ) than exponential, c asymptotically scales to 
zero or infinity, respectively, when the limit L —> 00 is taken at 
fixed c. This implies that greedy adaptive walks on the RMF 
landscape behave asymptotically like those on an uncorrelated 
landscape in the first case, their length approaching (1) - e - 1, 
whereas in the second case the walks move all the way to the 
reference sequence and (Z) ^ aL. Because of the logarithmic 
dependence of c on L, corrections to this asymptotic behavior 
are however expected to be important, and can be obtained from 
the results of Sec. [3]by replacing c with c. 

Frechet class. This class comprises distributions with a power 
law tail and can be represented by F(y) - 1 with y > 1 and 
p > 0. Choosing a^ - and bi - 0, the limit (l45t becomes 

/:f(z)= lim(l-^) (50) 

L-^oo \ j 

with the support z > 0. Accordingly, c = Assuming 

that c remains finite when taking the L —» 00 limit, c approaches 
zero and the problem becomes identical to the greedy walk on 
an uncorrelated landscape. 

Weibull class. Lastly, we consider distributions with bounded 
support, as represented by the distribution function F(y) = 1 - 
(1 - yY with y 6 [0,1]. Setting a^ = jjjg 

limiting distribution is 


K(z) - lim F(aiz + bi)^ 

L—>cc 


Kw(z) = e ^ 


(45) 


(51) 















with the support z < 0. Hence, in this case c = For finite 

c, c is effectively infinite so that Hi - 1 and (/) » aL. 


To summarize the results of this section, we have shown that 
it is only for distributions with exponential tails that the mean 
greedy walk length displays a non-trivial dependence on c, and 
in this case the results of Sec.[3carry over without modification. 
In all other cases a non-trivial asymptotic behavior requires that 
the strength of the fitness gradient c is scaled with L in such a 
way that c has a finite limit for L oa. 

For the non-Gumbel extreme value classes characterized by 
the limiting distributions (fSOl l and (ISTT i a closed-form solution 
analogous to that obtained in Sec. |3] for the Gumbel class ap¬ 
pears to be out of reach, with the exception of the Weibull class 
with V = 1, where the explicit formula 


(0 = 


i; 

\k=0 


(-ir 

k\ 


\ 


-k(k-l)c/2 


-1 


- 1 


(52) 


can be derived for a = 1 (see |Appendix C| l. In the general case 
we therefore resort to approximations that are valid for small 
and large c, respectively. Apart from their intrinsic interest, 
these results can be used to compute corrections to the asymp¬ 
totic walk length when c and L are both finite. Throughout we 
assume a general limiting distribution function Kiz) with the 
corresponding probability density/ji(:(x) = 


4.3. Small c approximation 

We begin with the case a = 1, where previous results for the 
or dering probability (|43l l can be exploited. Indeed, the results 
of Franke et al. ( 2010l) imnlv that 


1 

Hi^- + - 

n (/ 


~ /-»0( 


dxfKixf 


• 0{C^) 


(53) 


for / > 2 whenever the integral on the right hand side exists 
(note that Hi - \ independent of c). Summing over I it thus 
follows that 


</) = e - 1 


■f 


dxfKixf + oif). 


(54) 


Although the case of general a is more complex [as can be 
seen by comparing the expressions (l42l i and (l43l l1. the extensive 
calculations presented in Appendix D and [Appendix E yield a 
simple result which amounts to replacing c by {2a— l)c in (15^ . 
Evaluating the integral over fxixf for the limiting distributions 
(l48l l and (fSOl l. we thus obtain 


</)g = e - 1 h- ^ ec, (55) 

</)f = e - 1 h- {2a - \)ecp2-^-'^l‘^Y{2 + (56) 


to leading order in c. Note that the result for the Gumbel class 
is consistent with Eq. (ITTT i. 

Eor the Weibull class, the integral on the right hand side of 
(O exists only for v > 5 , and a more careful analysis is re¬ 
quired to find the leading correction in c. Detailed calculations 



c 


Figure 5: Double logarithmic plot of 2((/) - ^ + 1)/(^(2 q' - 1)) vs. c for the 
Weibull class with v = 1 and various values of the scaled initial distance a. The 
data sets for different a collapse into the line y = x as predicted by Eq. Oi). 


are found in Appendix E The final result for the mean greedy 
walk length in this case reads. 


^2-2+i/vp 2 



V > 


2 ’ 


{l)w - e + I 
e{2a- 1 ) 


c 

4 


ln(e^’' 'c). 



(57) 


F(l-2y)F(y+l) .2, ; 

2 r(l-y) ^ 2 


where y ^ 0.5772 is the Euler-Mascheroni constant. 

In Eig.|5| we confirm the validity of Eq. (fSTl i for Kw{x) - e’^ 
(x < 0, y = 1) by simulations. Note that the result for y = 1 and 
a - 1 can also be obtained by expanding the exact expression 
(15^ up to (9(c). In Eig. j^we show simulations of Eq. (i46T l for 
the Weibull class with various values of y and a - 1. Each data 
symbol in Eig. [ 6 ] is the result of 10'* independent runs. The 
predicted Eq. (iSTl i is in good agreement with the simulations. 


4.4. Large c approximation 

When c is very large, the walker takes uphill steps toward the 
reference state with probability close to 1 as long as a 0. (If 
a = 0 and c is very large, (Z) a; 1 H- H 2 as for large negative c 
in Sec. 13.21 ) In this case, we can neglect the effect of downhill 
steps and the problem is r educed to the ordering problem stud¬ 
ied by Eranke et al. ( 2010ll with random variables drawn from a 
distribution K{zY. If the support of Kiz) is unbounded from the 
above as in the Gumbel and Erechet classes, the walker stops at 
the Z’th step when Z/ happens to be larger than c. Thus (Z) can 
be estimated from the relation 1 - /r(c)“ = 1/(Z). Eor Kq, we 
get (Z) ~ e‘^ja - exp[ 6 >c(lnL)*"*^®]/Qr. Eor/Tf, we get 


a aL 


(58) 


If the support of K{z) is bounded from above but unbounded 
from below as in the Weibull class, the walker should stop at the 
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c 


Figure 6: Double logarithmic plot of </) - e + 1 vs. c for the Weibull class with 
V = 2,1, 4, I, and H ere a is set to 1. The straight lines close to each data 
set are given by Eq. i l57t with the coiTesponding values of v. 



Figure 7: Mean walk length for uniformly distributed random fitness compo¬ 
nents with sequence length L = 2*®, 2*^, 2®®, 2^^, and 2®® (from bottom to 
top) and antipodal starting point {a = 1). As predicted by theory, in this case 
the walk length is a function of cL. The number of runs is between 10® (for 
L = 2®®) and 2 X 10'* (for L = 2*®). 


(/ - l)th Step when Z; happens to be smaller than -c. Thus (/) 
can be estimated from {l)K{-cy = 1, and using the expression 
for Kw, we get {/) ~ As an example, we present 

simulation results for the uniform distribution [F(x) = x] with 
a = 1 in Fig. |7] Note that the leading behavior of Eq. (l52l i 
for large c is 2e'^ - 1, which is consistent with the approximate 
estimate as well as with the simulation results in Fig.|7] 


5. Discussion and conclusion 


Adaptive walks arise as limiting cases from standard pop¬ 
ulation genetic models and represent an important paradigm 
in the theory of adaptation that has generated a number of 
non-trivial and experimentally testable predictions ( Onl 2005 


Schoustra et al.L 2009t Seetharaman and .lainL 2014 ). In partic¬ 


ular, the greedy adaptive walk considered in the present arti¬ 
cle is of biological interest for two reasons. First, it can be 


viewed as an approximate description of adaptation in a situ¬ 
ation where the supply of single beneficial mutations is high, 
such that all mutants are generated simultaneously and the mu¬ 
tation of largest effect takes over by selection. Second, the 
greedy search strategy is arguably one that locates local fitness 
maxima in the smallest possible number of steps. Greedy walks 
therefore provide important insights into the geometry of h igh¬ 
dimen sional random fitness landscapes, where, as shown by jOn 
(120031) for the uncorrelated case, fitness peaks are found within 
2 mutational steps on average. 

Here we have generalized the analysis of OrrI ( 2003h to the 
class of RMF models, where a fitness gradient of strength c 
is introduced to smoothen the landscape and to induce corre¬ 
lations between genotypes. Fitness correlations are generally 
expected to increase the length of adaptive walks, and we show 
that this is true in most but not all situations. 

Importantly, we find that the effect of the fitness gradient on 
the length of greedy walks depends crucially on the tail proper¬ 
ties of the distribution underlying the random fitness component 
of the RMF landscape, which can be classified in terms of ex¬ 
treme value theory (EVT). The results of our analysis in Sec.|4] 
imply that greedy walks on the RMF landscape are asymptoti¬ 
cally as short as in the uncorrelated case when the distribution 
of the random fitness contribution is heavy tailed (Frechet or 
Gumbel with tail fatter than exponential) but become very long, 
with length equal to the distance to the reference sequence, 
when the distribution is light tailed (Weibull or Gumbel with 
tail thinner than exponential). Analogous results that single out 
fitness distributions with exponential tails with regard to struc¬ 
tural properties of the RMF landscape (such as the number of 
local fitness peaks) and the length of random ad aptive walks on 


this landscape have be en reported previously (iNeidhart et al 


l2ni4HParketal.Ll2ni.5h. 


The prime representative of exponentially-tailed fitness dis¬ 
tributions in the Gumbel class of EVT is the Gumbel distri¬ 
bution itself. For this case detailed results for the distribution 
of the greedy walk length were obtained in Sec. [3j for finite 
L and antipodal starting point as well as for arbitrary starting 
points and L ^ oo. Perhaps the most surprising result of our 
analysis is the finding that the mean walk length depends non- 
monotonically on the strength of the fitness gradient c, when 
the walk starts closer than at distance L/2 from the reference 
sequence (i.e., the scaled dista nce is g < ^). This beh avior was 
first observed numerically by INeidhart et al.l (l2014l) . and we 
have argued that it can be related to a similar non-monotonicity 
in the local density of fitness peaks. 

Although the analysis in Sec. 13.21 is restricted to Gumbel- 
distributed fitnesses, the fact that the leading order correction 
to the uncorrelated walk length derived in Sec.|4]is universally 
proportional to 2a - 1, and hence changes sign at a = j, indi¬ 
cates that the phenomenon is robust and does not depend on the 
distribution of the random fitness component. Notably, within 
the EVT of adaptation it is usually assu med that the fitness of 
the wild typ e is high in absolute terms ( Gillesni j. 1984; On . 
2 OO 2 I 20051) . In the context of the RMF model this implies that 


the adaptive walk starts rather close to the reference sequence, 
that is, at small a, where the minimum in the walk length as a 
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function of c is particularly pronounced (see Sec. 13.21) . 

The existence of this minimum appears to contradict Orr’s 
conjecture that the c — 0 value (1) - e - 1 consti tutes a gen - 
eral lower bound on the length of adaptive walks dOni 120031) . 
However, in formulating his conjecture Orr demanded that the 
walk starts at a randomly chosen point in sequence space, which 
implies that our result in Eq. (|J71) should be averaged over a. 
Since the probability of choosing a is symmetric under the 
transformation a i-> I - a, the average of 2a - 1 is zero and 
that of a(l - a) is positive. It follows that the averaged walk 
length cannot be smaller than e - 1. Similarly, in Rosenberg’s 
rehnement of Orr’s conjecture it is postulated that the htness 
values in the landscape are identically distributed, and that the 
htness correlation s between neighboring genotypes are positive 
(Rosenberg 20051 Sec. 5). Whereas the l atter statement applies 
to the RMF model ( Neidhart et ah . 2014I) . the former does not. 
We concl ude, theref ore, that the seeming violation of the con¬ 
jecture of On] ( 2003 ) must be attributed to the anisotropy of the 
RMF landscape. 

An important aspect of adaptation that we have not addressed 
in this work concerns the htness level reached by the population 
at the end of the adaptive walk. In a recent comparative study of 
different types of adaptive walks on Kauffman’s NK-landscape, 
it was found that greedy walks reach higher htness levels than 
random adaptive walks on correlated landscapes, but the rank¬ 
ing amon g the walk types may ch ange in the presence of cor¬ 
relations (iNowak and Krugl 2015 ). The results presented here 
suggest that a detailed analysis of the interplay between htness 
correlations and the efhciency of different modes of adaptation 
may be feasible within the framework of the RMF model, and 
we hope to report results along these lines in the future. 
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If xi (x 2 ) is the largest random number among the uphill 
(downhill) neighbors and if either xi -i- c or X 2 - c is larger than 
the walker takes one further step. The actual direction will 
be determined by checking whether xi -i- 2c > X 2 or not. If ^ is 
the largest, the walker stops. 

If k is very large, we sometimes use the following approxi¬ 
mation 


yl/k ^ ^Iny/k ^ j ^ 


Iny 


-H 


1 ^\ 

2k I 


(A.l) 


As a rule of thumb, for k > 50 000, the above approximation 
gives a more accurate value than the direct power calculation 
when we perform numerics with double precision (~ 10 '^). 
Note that when k is very large, it is better to use 1 - x when 
deciding the fate of the walk, otherwise there could be round¬ 
off errors which give x = 1. 


Appendix B. Derivation of Eq. (O 


We hrst derive Eq. ([33]). The integral over yi is readily cal¬ 
culated as 

X y2+cr2C 

exp(-yi - cTic - Lpe~°'"^e^^')dy\ - 

oo 

exp(-Ly6eA°-i+‘^2ke-f2). (B.l) 

Using the above equation, we can calculate the integral overy 2 
as 


Lp 


pys+o-sc 

I exp(-y2 - cr 2 C - LjSe“°’^^(l -I- 

kJ —oo 


1 


■ exp -t , (B.2) 


1 -I- g o-ic 

from which one can easily guess and prove that 

2 


(W 'f]' I 

k=r, 


•yk+a-kC 


Qiyk-i + o-k-ic)dyk-i 


n 


1 


=f 1 + e 




exp 






. (B.3) 


Appendix A. Simulation Method 

Since we only need the largest value among a certain number 
of i.i.d. random variables with a known distribution function, 
only two random numbers are necessary to check if the walker 
can take a further step (see also Sec. EtJ. To be concrete, let us 
assume that the walker is at C with htness —dc+^ where d is the 
Hamming distance of C from the reference sequence. Since the 
cumulative distribution of the largest random number among k 
variables is F(x)^, x can be generated by x = where y 

is a uniformly distributed random number (0 < y < 1). For the 
Gumbel distribution, x = Ink - ln(- Iny) and for the uniform 
distribution, x = y*A. 


The hnal integral over y; gives Eq. (1331 ). 

For small c, we hrst expand the denominator in Eq. (1331) up 
to 0{c^), which is 


k-l 




-cM,„ 


= k 


! k-\ 2 k-\ \ 

1 - - y M,„ — y Ml 

k Zj 2k Zj 

m=\ m=l / 


(B.4) 


Then we expand the terms in H[ up to (9(Z) to get 
I'Hi = ^ 5 ({o-)) |l -H cyi H- y (72 + 73 - 74 )] , 


{o-} 


(B.5) 
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where S ({cr)) = HLi and 


n 


/ ^ k—I m 

k=l m=l n=l 


72 


73 


74 


I * f ^ 

Zh ZZ- 

A=1 V/n=l n=\ 7 

(I -^k-l m 


ZjZZ- 

k=\ m=\ «=1 

/ A—1 / m 

ziz z- 


A=1 m=lV«=l / 

The summations over cr in Eq. (Ib31 i have two forms 


^ S ({cr))o-,„ = (Si - «-!) I~[ ^ 

|o-| k+m cri, 

^ S ({cr))o-„, 0 -„ = (5^ + 6mn(i “ 5^) 


(B.6) 

(B.7) 

(B.8) 

(B.9) 

(B.IO) 

(B.ll) 


{cr] 


where d = and we have used ii + i_i = 1 and crl - 1. 

Thus, we get 




/ j A-1 


- = ZiZZZ s ({o-))cr„ ^ 

A=1 m=l n=\ cr A=1 m=l 

Kl-\). 


-5, 


I j A-1 m A—1 r 


(B.12) 


72 


=ZfZZZZZ s ({o-))o-jcr„ 


A=1 m=l n=l r=l .9=1 {cr} 

I A-1 m A—1 r 

= ZpZZZZ('''+"‘-■'■)) 


} 2 A-1 m A—1 

¥ 

A=1 m=l n=l r=l 9=1 
(^- 1 ) 


k=l k=l m=l ;•=! 


= (5 


, 2 ;(/- 1 )( 2 /- 1 ) 
24 


(1Z 


it=i 


(;t- l)(2;t- 1) 
6 ^ 


= Z/(;_1)(2Z-1)+1-^(Z2_2Z + Har[/]), (B.13) 

where Har[/] = ^Li ^ and 

/ I . Ai-1 m A 2 —1 r 

«= Z Z jii; Z Z Z Z 

Ai = 1A2=1 w=l «=1 r=l 9=1 




‘ ' k-¥ 


z 

V <:=1 


I I j yti^l 


= 6- 


.2(1(1 -W 
16 


+(1 - Z Z ^ Z Z 

/ ki = \ k2 = \ ‘ ,.^1 


Ai = l A2=1 


6K 


(B.14) 


where K - max(A:i,A: 2 ) and k - min(A:i, A: 2 ). The summation in 
the last line of the above equation is 

^ (A:- 1)(2A:-1) {k-\){3K-k-l) 

Zj Tk 


A=1 


K=2 A=1 


6K 


/(14/^-33/ + 37) 1,, 

-i08- 


(B.15) 


And finally 


/ A-1 m m 


74 


zizzzz s ({o-})o-„0-r 

A=1 m=l rt=l /■=! {cr} 

/ 1 A-1 m m 

ZtZZZ('*'-"*'<i-«')) 

A=1 m=l rt=l r=l 

ZI Z('*""" 

A=1 m=l 

1(1-1) .1(1-W-2) 

+ d - 


(B. 16 ) 
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Combining these results, we arrive at Eq. (l36l l. 

Appendix C. Derivation of Eq. (l52l) 

This appendix calculates the mean walk length for the case 
of K(x) = (x < 0) with cr = 1. Eor brevity, we will denote 
the mean walk length by C and we drop the tilde in c in this ap¬ 
pendix. Eor completeness, we write the probability //; of taking 
at least I steps 


77; = I Mx)dy, 

*J—CX3 

where Ji(x) satisfies the recursion relation 

X X+C 

Ji-i(x), 

oo 


(C.l) 


(C.2) 


with Ji(x) - e^. Since the support of ii(x) is v < 0, Eq. dOll 
for -c < X should be interpreted as 


ii(x) = 


Let 


ip(x) 


X X ^ 

Z^' 

“ ;=i 


ji(y)dy. 


(C.3) 


(C.4) 


which is related to the mean walk distance hy C - Y^i Hi - <p(0). 
By summing both sides of Eq. (1C.21) from I - 2 to infinity, we 
get the difference-differential equation 


d(p(x) 

dx 


= e^(\ -I- ip(x + c)). 


(C.5) 


where (p(x) with x > 0 should be interpreted as €. Thus, for 
X > -c, we have 


ip(x)-ip(-0^(\+€)(e^-e-^). 


(C. 6 ) 
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Using ^(0) = {, we get (p{—c) = (1 + - 1 which, in turn, 

gives, for x > —c. 


tp(x) = {l+{)e^-l. 


(C.7) 


Having determined (p(x) for x > -c, we can find ip{x) for -2c < 
X < —c and so on. After a few attempts, we make an ansatz, for 
—nc < X < —in — l)c. 


ip{x) = (1 + ^) y — exp(kx + k{k - l)c/2) - 1, (C. 8 ) 

k\ 


k=0 


which satisfies Eq. (IC.Sl l. From the continuity of ^ at x = -nc, 
that is, ip{—nc + 0 ) = ip(-nc - 0 ), we get a recursion relation for 
the Un as 


Cln+l — ^ 


_ -«(n+l)c/2 


! ^k{k+\)cl2 ^k(k~-\)ct2 


z 

k=0 


Ok 


(n-k)\ {n-k+l)\ 


, (C.9) 


which is identical to 


n-l , 

(a„+i - = - V --— {ak+\ - at) 


.k(k+\)cl2 


1 


(«+ 1 )! 


(C.IO) 


with ao = 1 and ai - 0. If we define — {k + l)!(a,t+i 
we get 


or 


/:=0 ' ' 


z(';:;V‘=-'- 


/t=0 


(C.ll) 


(C.12) 


Since do - -1 and = -1, we conclude that 

dk - (-1)^^*- That is, we get the recursion 


\n+l 


^n+l 


^ ~ff(ff+l)c/2 

(n-\- 1)! 


(C.13) 


which is solved by 




” r -I \k 


k=0 


(yy -i:(H)c/2 
kl 


(C.14) 


The mean walk length { is determined by the boundary con¬ 
dition i,o(-oo) = 0. Since decays exponentially to zero unless 
k -0, this condition becomes 

0 = (/)(-oo) = -lH-(^H-l)lima„ (C.15) 

n-~^oo 

which gives Eq. (EHi. 


Appendix D. Small c behavior : formal derivation 

In this appendix, we present a formal derivation of the small 
c behavior reported in Sec. 14.31 In analogy to (1281) we first in¬ 
troduce 


qK{z, cr, c) = COc-fK(z) 


K(z + Icrc) 


K(z) 


1-0. 


(D.l) 


where (x>+\ - a, ai-i = 1 - or, and fxiz) - We also 

introduce ji iteratively as 

X A'+tT/C 

ii-i(yAo-}i-\)dy, (D.2) 

oo 

with i\{x, {cr)i) = qK{x, cti, c). Using ji, we can write 

OO ^OO 

= I (D.3) 

/=! lo-l, 


I O')/ 

where Z|o-|, stands for the summation over all possible cr,’s for 

Since the mean walk distance for c = 0 is e - 1 for any dis¬ 
tribution, (/) should take the form 


<1> = e - 1 -H A{c) 


(D.4) 


with the property that A(c) —> 0 as c —> 0. In this appendix, we 
find the leading behavior of A(c) for small c <sc 1. At first, we 
decompose J/(x, {cr);) = jf\x, (cr)/) + g/(x, (cr);, c), where 

Ji°^ (x, (cr),) = O'® (x, K(x)'^^ 


(i-iy- 

^A(lcrh)~[K(x)y, 


with 


cr) = ^k(z, cr, c = 0) = CL^a-hiz), 

I 

A({fr},) = []cu^„ A({cr)o)=l, 


(D.5) 

(D. 6 ) 

(D.7) 


and gi satisfies the recursion relation 

gl(x, (cr)/, c) =%(x, (cr),, c) 


+ qKix, 


pXA-<TiC 

, cr;, c) I I 

U — oo 


^/-i(u{o')/-i,c)iiy, (D. 8 ) 


with 


klfiix, {cr};, c) = 


A({cr),_i) 


( 1 - 1 )! 


qKix, cr;, c)K{x + cr;c) 


1-1 


- q\ix, cri)Kix) 


i-i 


(D.9) 


Note that 7 ° is the solution of Eq. (ID.21 i for c = 0. Defining k; „ 
recursively as im> 1 ) 


kl,mix, {cr};+;j,, c) 

qKix, CTl+ffi 


m,c) _ r 

,C) “X, 


•A+£7'/+,„C 


ki,m-iiy, {cr};+,„_i, c)dy, (D. 10) 
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we can formally write 


z-i 


gi 


(X, {cr),, c) = ^ kj^m,m {x, {cr)/, c) . (D.l 1) 


If we define 

X oo 

^/,m (.X, {rr}/+m, c) zix, 

DO 

we can write 


(D.12) 


ZEE ^/,m ({*^}/+ni5 5 (D.13) 

l=l m=0 |a-| 

where A(c) is defined in Eq. (ID.4b . Since, for any x and cr, 


f 


hm(y, {o'}l+m, C)dy 


r 


kl,m(y, (cr)/ c)dy\ 


(D.14) 


we get an inequality for any I and m such as 

X OO 

qK{x,cri+,„,c)dx\ai^m-\\ = \ai,m-\\ < |a/,o|, (D.15) 

oo 

which shows that - O (ai o). 

To extract the leading behavior of /1(c), we consider the 
derivative of with respect to c. That is, we consider 


oc 


-r 


ki^mix, {cr)/+„„ c)dx. 


(D.16) 


where ki^m is defined as 


ki,m(xAo-}l+m,c) = —A:/,m(x,{cr},+,„,c). 
OC 


(D.17) 


Notice that 


dA(c) 

dc 


CO oo 


EEE bl,m ({cr), +m 5 c) . (D.18) 


/=1 m=0 {cr} 


If bim - 0 {c’’) with 77 < 1 for all m, we can write ki^m as the 
sum of and Ri m which have the property that 

X OO 

Kl,m(y, (cr), +m^ c)dy - 0 {c^), 

OO 

X oo 

RUyA<x)i^m,c)dy ^ o{c^). (D.19) 

oo 

Now we will find a recursion relation for a:, from the exact 


relation, ki^^ix, {cr]i+m, c) = Eti where 

h ^ q%{x,Cri+m) I Ki^m-\iy,{o-}l+m-l,c)dy, 

\J —OO 


h 
h - 


= q%{x, cri+,„) I 

V/—0< 


Ri,m-i{y,{o-]i+m-\,c)dy, 


X X-¥(Ti+mC 

h,m-iiy,{cr}i+m-\,c)dy 

X x+(ri+„,c 

ki^m-\(y,{cr}i+m-\,c)dy, 

OO 

^5 ~ ^l+mt ■> 

rX+O-J+mC ^ 

h = [qK - q\\ I h,m-i(y, {cr]i+m-i,c)dy, 

\J —oo 


and 


I 1 ^(T 


q]^{x, cr, c) = -^qK(x, cr, c) 

T ^fK(x)fK(x + 2 o-c) lK(x + 2 a-c)y 

= 2"»(1-") [ KU) I ■ 

Since Ij,, I 4 , Is, h are zero if c = 0 (note that k^m is zero if 
c = 0 ), the integrals over these functions are o(l) and they 
contribute to the remainder terms Since the integral of 

I 2 should be o(c^), we find Ri ,„ - 


/f,,,„(x, {cr),+,„, c) = q'^j^ix, ai+m) I {cr}i+m-\,c)dy. 

k ) —00 

(D.21) 

In fact, the above consideration reveals that if we choose /?,,o 
such that the integral of Ri o(y) is o(l), the analysis of ki „, should 
give all terms up to (9(1). 

If we define 


0(X) 


0000 

= EEE f 

Z=1 m=0|o-l,+„. 


^l.m(y, {cr)z c)dy, (D.22) 


we can write a differential equation for 0 (x) such as 

d4>{x) 


dx 


-■ fK{x)(p{x) +xix, c) 


(D.23) 


where 


X(x, c) = EE Ki,o(x, (cr),, c), (D.24) 


Z=1 {cr|, 


and we have used Eo- q^K^x, cr) = /k^x). The solution of 
Eq. (ID.23b is 


^(x) = r g ^^^(y, c)dy, 

%J — OO 


(D.25) 


which is related to A(c) as 


dA(c) 

dc 


■f 


lim ^(x) -el e ^^\(y, c)dy - eif/(c), (D.26) 
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and, in turn, 


{l}^e-l+e f tp(x)dx, (D.27) 

Jo 

where the definition of i/r is clear from the context. 

If we choose /c/,o = ki,Q and Ri q - 0, (p{x) can be used to find 
all terms up to (9(1). With this choice, we get 


We begin with the analysis of N\{cr,c). Since the support of 
/a:(x) in question is x < 0, we write A^i(cr, c) as 


N\((t,c) 
2 crv^a(l — a) 


L 

f^OO 

I dy \y(y + 2c)] e~^"^ 
Jo 


dx[x(x + 2crc)]''-' 

I (y+2i/(cr)c,cr,c) 

(E.IO) 


c) = ^ 


zz k,fi(x, {crji, c) 

Ml {(Tl, 

= Xi cr, c)fK(x + CTC)] e 


(D.28) 

,K{x^(tc) 


where ii{cr) - max( 0 , cr), we have changed variables to y = 
-X - 2u{cr)c to get the second line, and 

^Ii(z, cr, c) = z'' + (1 - Wo-) [(z - 2 crc)'' - z’'] + 

(E.ll) 


Appendix E. Small c behavior : explict formulae 


If we again change variables to z = y/c, the above integral be¬ 
comes 


This appendix is a continuation of [Appendix D| and presents 
the explicit small c behavior for the various classes. We first 
assume that i/^(0) is non-zero. If this is true, the mean walk 
distance becomes 

<Z) = e - 1 H-eciA(O) H-o(c). (E.l) 

Since = 0) = 0 and, in turn, ;tf(x, 0) = (2Qr - 

^)fK{xf exp[A(x)], we get 

X co 

Mxfdx, (E.2) 

oo 

and, accordingly. 


2a-vMl-g) ‘X W^ + 2)]’'-'c-f-(^’‘^'^>c/y, (E.12) 

where ^i 2 (z, cr, c) = ^ii(cz-i- 2 cM(cr), cr, c), which is zero if c = 0 . 
Thus, the leading behavior of N\ is 

2vM\-a) " X ^ ^ 

where the integral is finite ifv< 5 . Thus, Ai(l,c)-i-Ai(-l,c) = 
c,(g 2 y-i) y strictly smaller than i. 

When V - the integral in Eq. (IE.131 I is not defined, which 
requires different approach for this case. To extract the leading 
behavior, we performed integration by parts such that 


</) = c-l+cc(2cr-l) r /^(x)"+o(c), (E.3) 

U — (X) 

as long as the integral is finite. As advertised, this generalizes 
Eq. (I54I 1 to a < 1 . Eor the Gumbel and Erechet classes and for 
the Weibull class with v > 5 , the integral becomes 


r 


4 


(E.4) 


/ 


X CO 

ju2jc-2(^+i)e-2r-''^^ = (2 (E.5) 

= v2^2+‘/T (2 - V"') . (E.6) 


Eor the Weibull class with v , Eq. (IE. 6 I 1 is not applicable 
and we have to be more careful to find the small c behavior of 
i/c(c) for this case. To this end, we first write 


iA(c) = ^ [Ai(cr, c) + A 2 (cr, c)], 


where 


(E.7) 


(E. 8 ) 


X oo 

qjj-(x, cr, 

CO 

X co 

qK(x, cr, c)f(x + (E.9) 

OO 


^- 21 n(V^) 

2crv2a(l—a) ' ' 

+ 2^ c/yln(v5^-H 

r“ , . 

= -ln(2c) + 2 —-ln (2 VIlc/xH-oCl) 

Jo 2 ^/x ^ ’ 

=-lncH-ln2-2y-Ho(l), (E.14) 

where we have used {S > 0) 

’ (E.15) 


In ( -H Vx H- .S) 
ax ^ ’ 


2 y/x(x -H S) 


and y » 0.5771 is the Euler-Mascheroni constant. Still, 
Ai(l,c) -H Ai(-1,c) = o(l) even for v - Hence, Ni does 
not contribute to the leading behavior of i/c(c). 

Now we move on to the analysis of A 2 . We first write N 2 as 

r ' \x(x-H crc)]'-' 

—CO 


crcoo-y^ 


f^OO 

= [y(y-H c)]’'^* (E.16) 

Jo 

where we have made a change of variables y = -x - u(cr)c and 

^21 (z, cr, c) =(1 - oJa-) [z*' -I- In K(-Z + 2crc)] 

+ z'' + (z-o-cy + e^^'' - (E. 17) 
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Note that In K(z) = 0 if z > 0. As above, the change of variables 
to z = y/c gives 

r dz [z(z + 1)]’'^* (E 18 ) 

CTiOa-V^ Jo 

where ^ 22 ( 1 , o-,c) = ^21 (& + cu(o-),cr,c) with the property 
^ 22 ( 2 , cr, 0) = 0. Thus the leading behavior of N 2 is 


N2(cr, c) ^ (TOJa-V C' 


.2~2v-l 


r 


dz [z{z + 1)] 


v-l 


2.2v-ir(l-2y)r(y) 

= r(i-v) ’ 


(E.19) 


which is valid for v < i. For v = |, integrating by parts gives 
^ 2 ( 0 -, c) 


= - Inc 


(TOJcrV- 
+ 2 


Jy In [ Vy + y/y + ^j ^ 

ln(2Vy) 


= - In c + 2 


r 


dy- 




e^^'^ + o(l) 


= - Inc - 2y + o(l). 


(E.20) 


Since wi - tu_i = 2a - 1, we finally get the leading behavior of 
tf/(c) for V < i as 


f 


2-2v-ir(i-2v)r(v) 

i/^(c) (2a - l)v c —— --—, 

r(l - v) 

2,r(i-2v)r(v + i) 


ij/{x)dx — (2a — l)c 


2r(l-y) 


and for v = ^ as 


He) ■■ 

I il/{x)dx ■ 

Jo 


2a- 1 
4 

2a- 1 


(- In c - 2y), 
cln(e^’’"'c). 


(E.21) 


(E.22) 
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